We read with interest the recent paper by Müller et al1 that aims to provide convincing reference values for robotic distal pancreatectomy (RDP) from 16 international high-volume centers. We would be grateful if the authors would clarify the center eligibility criteria and the numbers used in the analyses of results. We commend the authors on reporting the total yearly volumes of pancreatic resections, distal resections, and RDP, and the number of surgeons performing RDP in the center-specific caseload found in Supplemental Table 2 (see https://links.lww.com/SLA/E53). This is in line with the Miami evidence-based guidelines on minimally invasive pancreas resection (MIPR),2 which state that center volume and annual individual surgeon volume affect outcomes. We would be interested to know the centers’ MIPR-specific volume, as suggested by the guidelines. Based on the presented information, there is wide variation among the centers in the proportion of RDP to total distal resections (range: 21%–100%) and the average volume of RDP per surgeon (range: 4–15 per year). It would be noteworthy whether the MIPR-specific volume, the proportion of MIPR or RDP, or surgeon volume correlated with center-specific outcomes, and whether these variables interacted with the proportion of benchmark cases per center. From the available data, 2 of the 3 reported center eligibility criteria, >50 annual pancreatic procedures per center and >20 RDP (Table 1 of the paper), were not met by 2 centers: center 13 lists 40 pancreatic resections per year, and center 1 had fewer than 15 RDP, as detailed in Supplemental Table 2 (see https://links.lww.com/SLA/E53) and Figure 1 of the paper, respectively. In contrast, applying the study protocol center eligibility criterion of at least 10 RDP per year over the last 3 years (https://benchmark4rdp.org/protocol/) excludes 6 of the 16 centers (38%) and, on visual inspection of Figure 1 of the paper, around a quarter of benchmark cases.
Centers 1 to 5 each had fewer than 30 cases in total (Fig. 1 of the paper), and center 16 lists 8 RDP per year (Supplemental Table 2, see https://links.lww.com/SLA/E53). We wonder whether sensitivity analyses applying strict adherence to the published center eligibility criteria, or a per-protocol analysis, may have resulted in different benchmark values. Unlike other surgical benchmark studies that report a precisely defined study period,3,4 the open-ended and heterogeneous “start of RDP program” introduces another confounder (differences in time5,6) that may potentially be controlled for. In the specific example of centers 15 and 16, based on the available data, the number of RDP may have been accumulated over periods of less than 2 years and over 12 years, respectively. Notably, the center eligibility criterion defined by the study protocol (at least 10 RDP per year over the last 3 years) would have mitigated such a bias. There also appears to be an overlap between cases considered low-risk and high-risk. The authors report a total of 755 cases, of which 433 (57%) were low-risk, benchmark patients and 455 (60%) were high-risk, nonbenchmark patients, that is, 888 cases in total. “After removing the first 10 cases from each center to minimize the impact of the learning curve, 345 patients (46%) were included in the final benchmark analysis.” This would imply that 433 − 345 = 88 of the 160 removed learning-curve cases were low-risk, and hence that the remaining 72 were high-risk, nonbenchmark patients. In assessing the correlation between the proportion of benchmark cases and center-specific outcomes (Fig. 2 in the paper), were any centers omitted (and for what reason)? Figures 2A and 2B appear to include only 15 centers, and Figure 2C only 14. Using the reported proportions in Figure 1 of the paper, 4 centers have between 50% and 60% benchmark patients: centers 7 (52%), 1 (54%), 13 (55%), and 9 (56%). However, there appear to be only 3 clear dots, representing 3 centers between 50% and 60%, in Figure 2A–C.
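The case-count arithmetic above can be verified in a few lines, using only the totals quoted from the paper (755 cases, 433 low-risk, 455 high-risk, 345 in the final benchmark analysis, and 160 learning-curve cases removed); the variable names are ours.

```python
# Counts as reported in the paper under discussion.
total = 755        # all RDP cases
low_risk = 433     # low-risk, benchmark patients (57%)
high_risk = 455    # high-risk, nonbenchmark patients (60%)
final = 345        # benchmark patients in the final analysis
removed = 160      # first 10 cases removed from each of the 16 centers

# The two risk groups sum to more than the total, implying overlap.
overlap = low_risk + high_risk - total          # 133 cases counted twice

# Of the 160 removed learning-curve cases, those accounting for the drop
# from 433 to 345 low-risk patients must be low-risk; the rest are high-risk.
low_risk_removed = low_risk - final             # 88
high_risk_removed = removed - low_risk_removed  # 72

print(overlap, low_risk_removed, high_risk_removed)  # 133 88 72
```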
In addition, there appear to be only 2 clear dots between 40% and 50% in Figure 2C, where 3 centers are reported to fall within this range: centers 3 (42%), 5 (46%), and 12 (47%). Were survival and recurrence-free survival calculated from only patients with a final pathological diagnosis of pancreatic ductal adenocarcinoma (PDAC), from all malignant tumors, or from the entire benchmark cohort? Under “Oncologic outcome for patients with PDAC” in Supplemental Table 1 (see https://links.lww.com/SLA/E53), “recurrence status, n (%)” is reported as “no recurrence 317 (92),” “locoregional recurrence 8 (2),” “liver 11 (3),” and “other recurrence or w/o known location 9 (3).” When using only the reported number of PDAC (n = 62) as the denominator, the recurrence status would be 55%, 13%, 18%, and 15%, respectively. Thus, it seems that this study inadvertently highlights the perpetual challenge of using volume in investigations of the quality of surgery and patient outcomes, particularly for minimally invasive surgery (MIS). There is no standardized definition of what constitutes a high-volume center,7 and even within a study, set criteria may be inconsistently applied.1,3 The use of national and international registries, such as the European Consortium on Minimally Invasive Pancreatic Surgery (http://www.e-mips.com/registry), may help recruit the number of centers required, given the high variability within key indicators8 (range of medians across the centers), for example, complications at 3 months (0%–69%) and clinically relevant pancreatic fistula (0%–45%). The authors propose the proportion of nonbenchmark cases as a novel marker of quality in expert centers. One concern regarding such a metric is that it may perversely incentivize inappropriate risk-taking to achieve a greater proportion of high-risk patients.
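The recurrence percentages questioned above can likewise be recomputed from the counts quoted from Supplemental Table 1; this sketch assumes, as our reading does, that the 28 reported recurrences all occurred among the 62 PDAC patients.

```python
# Recurrence counts quoted from Supplemental Table 1 of the paper.
pdac = 62                  # reported number of PDAC patients
cohort = 345               # final benchmark cohort
recurrences = {"locoregional": 8, "liver": 11, "other/unknown": 9}

recurred = sum(recurrences.values())   # 28 patients with recurrence
no_recurrence = pdac - recurred        # 34 patients, if denominator is PDAC

def pct(n):
    # Percentage against the PDAC denominator, rounded as in the letter.
    return round(100 * n / pdac)

print(pct(no_recurrence))                      # 55
print([pct(n) for n in recurrences.values()])  # [13, 18, 15]

# By contrast, the reported 92% only emerges with the whole cohort
# as denominator: 317 of 345.
print(round(100 * 317 / cohort))               # 92
```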
We propose that MIS provides an opportunity to shift from the volume-outcome quality paradigm and return to the benchmarking of processes of care associated with high-quality care and outcomes. Compared with open surgery, the relative ease of video recording the operation facilitates the measurement and study of technical factors and skills associated with patient outcomes.9,10 Future advances in artificial intelligence may also catalyze the use of video review, currently a laborious process undertaken by blinded experts.9,10 A renewed focus on the previously opaque quality processes of intraoperative care could broaden the involvement of centers and surgeons outside the “high-volume” cadre in quality improvement initiatives that would ultimately benefit a significant number of patients and surgeons worldwide.

ACKNOWLEDGMENTS

Phillip Chao is the recipient of a Health Research Council of New Zealand Clinical Research Training Fellowship (Reference number: 22/034).